{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# GBDT(Gradient Boosting Decision Tree) 简介\n",
    "\n",
    "## 背景知识\n",
    "\n",
    "- 决策树\n",
    "\n",
    "- 决策树构造原理\n",
    "    - CART生成\n",
    "    - 回归树：均方误差最小\n",
    "    - 分类树：基尼系数\n",
    "    \n",
    "### 回归树\n",
    " \n",
    "#### 回归树的函数表示\n",
    " \n",
    " $$f(x) = \\sum_{m=1}^M c_mI(x \\in R_m)$$\n",
    " \n",
    " M- 叶子节点数\n",
    " \n",
    " x 属于 Rm则 I为1， 否则为0\n",
    " \n",
    " #### 最优特征选取\n",
    " \n",
    " $$min_{j,s}[min_{c1} \\sum_{x_1 \\in Ri} (y_i - c_1)^2 + min_{c2} \\sum_{x_1 \\in R_2}(y_i - c_2)^2]$$\n",
    " \n",
    " j表示特征，s表示特征取值的临界分隔值\n",
    " \n",
    " $$R_1 = \\{x|x^j <= s\\},R_2 = \\{x|x^j > s\\}$$\n",
    " \n",
    " c1和c2分别代表R1和R2样本平均的label值\n",
    " \n",
    " #### 树的构建流程\n",
    " \n",
    " - 遍历所有特征，特征的最佳划分的应的得分，选取最小的分的特征\n",
    " \n",
    " - 将数据根据此选取的特征分化成两部分\n",
    " \n",
    " - 在左右两部分中遍历找到划分特征知道满足条件为止\n",
    " \n",
    "### 分类树\n",
    " \n",
    "#### 函数表示\n",
    " \n",
    "与回归树一致\n",
    " \n",
    "#### 最有特征选取\n",
    " \n",
    "- 基尼系数\n",
    "\n",
    "## 数学原理和构建方法\n",
    "\n",
    "### 梯度提升树的数学原理\n",
    "\n",
    "集成学习，采用boosting的方式\n",
    "\n",
    "#### 提升树的函数表示\n",
    "\n",
    "$$f_M(x) = \\sum_{m=1}^M T(x;\\theta_m)$$\n",
    "\n",
    "M 表示有M颗决策树，$\\theta_m$ 表示决策树的参数\n",
    "\n",
    "#### 学习方式\n",
    "\n",
    "$$f_m(x) = f_{m-1}(x) +  T(x;\\theta_m)$$\n",
    "\n",
    "即，每次只学习一棵树的参数\n",
    "\n",
    "$$\\theta_m = arg min_{\\theta_m} \\sum_{i=1}^N L(y_i, f_{m-1}(x_i) + T(x_i;\\theta_m))$$\n",
    "\n",
    "损失函数可以采用平方误差和的方式\n",
    "\n",
    "$$ L(y, f+m(x)) = [y - f_{m-1}(x) - T(x;\\theta_m)]^2 $$\n",
    "\n",
    "以上每轮迭代的时候，上一轮以前模型都是已知，这样能保证每次都学习一棵树，每次学习都是拟合残差\n",
    "\n",
    "### 梯度提升树的构建流程\n",
    "\n",
    "- 初始化$f_0(x) = 0$\n",
    "\n",
    "- 对于m=1,2...M，计算残差$r_m$，拟合$r_m$，得到$T_m$\n",
    "\n",
    "- 更新$f_m = f_{m-1} + T_m$\n",
    "\n",
    "### 梯度提升树残差数值改变\n",
    "\n",
    "$$r_m = -[\\frac{\\partial L(y, f(x_i))}{\\partial f(x_i)}]_{f_m(x) = f_{m-1}(x)}$$\n",
    "\n",
    "即，损失函数的负梯度在当前模型的值\n",
    "\n",
    "## XGBoost数学原理和构建方法\n",
    "\n",
    "### XGBoost的函数表示\n",
    "\n",
    "$$f_M(x) = \\sum_{m=1}^M T(x;\\theta_m)$$\n",
    "\n",
    "$$f_m(x) = f_{m-1}(x) +  T(x;\\theta_m)$$\n",
    "\n",
    "$$\\theta_m = arg min_{\\theta_m} \\sum_{i=1}^N L(y_i, f_{m-1}(x_i) + T(x_i;\\theta_m)) + \\Omega(T_m)$$\n",
    "\n",
    "可以看到很多地方都是和提升树类似的只是添加了正则化项。\n",
    "\n",
    "- 在XGBoost中我们对优化目标进行泰勒展开：\n",
    "\n",
    "$$f(x+ \\Delta x) = f(x) + f'(x)\\Delta x + \\frac{1}{2}f''(x) \\Delta x^2$$\n",
    "\n",
    "$$min_{\\theta_m}\\sum_{i=1}^N[g_i T_m + 0.5 h_i T_m^2] + \\Omega(T_m)$$\n",
    "\n",
    "其中：\n",
    "$$g_i = \\frac{\\partial L(y_i,f_{m-1})}{\\partial f_{m-1}}, h_i = \\frac{\\partial^2 L(y_i,f_{m-1})}{\\partial f_{m-1}}$$\n",
    "\n",
    "- 正则化：\n",
    "\n",
    "XGBoost中都是回归树\n",
    "\n",
    "$$f(x) = \\sum_{j=1}^Q c_j I(x \\in R_j)$$\n",
    "\n",
    "$$\\Omega(T_m) = \\partial Q + 0.5\\beta \\sum_{j=1}^Q c_j^2$$\n",
    "\n",
    "- 优化目标转化：\n",
    "\n",
    "原目标\n",
    "\n",
    "$$min_{\\theta_m}\\sum_{i=1}^N[g_i T_m + 0.5 h_i T_m^2] + \\partial Q + 0.5\\beta \\sum_{j=1}^Q c_j^2$$\n",
    "\n",
    "转化：\n",
    "\n",
    "$$ \\sum_{j=1}^Q [(\\sum_{i \\in R}g_i)c_j + 0.5 (\\sum_{i \\in R} h_i + \\beta)c_j^2] + \\partial Q$$\n",
    "\n",
    "- 目标函数最优解：\n",
    "\n",
    "$$G_j = \\sum_{j \\in R}g_j, H_i =\\sum_{j \\in R} h_j $$\n",
    "\n",
    "$$ \\sum_{j=1}^Q [G_jc_j + 0.5 (H_j + \\beta)c_j^2] + \\partial Q$$\n",
    "\n",
    "$$c_j = -\\frac{G_j}{H_j+\\beta}, obj = -0.5\\sum_{i=1}^Q\\frac{G_j^2}{H_j+\\beta} + \\partial Q$$\n",
    "\n",
    "- 最优化分特征选取\n",
    "\n",
    "$$c_j = -\\frac{G_j}{H_j+\\beta}, obj = -0.5\\sum_{i=1}^Q\\frac{G_j^2}{H_j+\\beta} + \\partial Q$$\n",
    "\n",
    "$$Gain = (\\frac{G_L^2}{H_L + \\beta} + \\frac{G_R^2}{H_R + \\beta} + \\frac{(G_L + G_R)^2}{H_L + H_R + \\beta}) - \\partial$$\n",
    "\n",
    "### XGBoost 总体流程\n",
    "\n",
    "- 初始化$f_0(x) = 0$\n",
    "\n",
    "- 对于m=1,2...M，应用最优化分构造树\n",
    "\n",
    "- 更新$f_m = f_{m-1} + learn\\_rating * T_m$"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
